Jieyu Zhang

MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Mar 30, 2026

URDF-Anything+: Autoregressive Articulated 3D Models Generation for Physical Simulation

Mar 14, 2026

Video-Based Reward Modeling for Computer-Use Agents

Mar 10, 2026

TrajTok: Learning Trajectory Tokens enables better Video Understanding

Feb 26, 2026

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

Feb 04, 2026

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Jan 15, 2026

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Dec 15, 2025

MolmoAct: Action Reasoning Models that can Reason in Space

Aug 12, 2025

CoAct-1: Computer-using Agents with Coding as Actions

Aug 05, 2025

Spatial Mental Modeling from Limited Views

Jun 26, 2025